VADER: A Sentiment Analysis Library for Social Media Text

Original 大邓 大邓和他的Python 2022-07-09

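VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis tool built specifically for social media text. It currently supports English text only, and 大邓 recommends it to everyone here. You can pair it with 大邓's video course "Python爬虫与文本数据分析" (Python web scraping and text data analysis) to collect your own data and analyze it yourself.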

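When scoring sentiment, VADER takes into account:

Negation (e.g., "not good")
Punctuation that conveys sentiment and intensity (e.g., "Good!!!")
Emphasis through capitalization and similar word-shape cues (e.g., "FUNNY.")
Degree modifiers (intensifiers such as "very"; dampeners such as "kind of")
Sentiment-laden slang (e.g., 'sux')
Slang words that modify sentiment intensity ('uber', 'friggin', 'kinda')
Emoticons such as :) and :D
UTF-8 encoded emoji (💘 and 💋 and 😁)
Initialisms and acronyms (e.g., 'lol') ...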

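VADER currently supports English text only. Given a Chinese lexicon in VADER's format, together with matching changes to the word lists in the source code (described further down), it can also be used to analyze Chinese text.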

Installing VADER

pip3 install vaderSentiment

Usage

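VADER analyzes a piece of text and returns a dictionary containing:

pos, the proportion of the text that is positive
neg, the proportion of the text that is negative
neu, the proportion of the text that is neutral
compound, the overall sentiment score of the text, normalized to the range -1 (most negative) to +1 (most positive)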
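To give a feel for why compound always lands in a fixed range, here is a minimal sketch of the normalization step. It assumes the formula used in the vaderSentiment source, x / sqrt(x² + alpha) with alpha = 15, where x is the sum of the rule-adjusted word valences. The sample inputs are arbitrary and only for illustration.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the interval [-1, 1].

    Assumption: mirrors the normalization in vaderSentiment.py
    (score / sqrt(score^2 + alpha), alpha = 15 by default).
    """
    norm = score / math.sqrt(score * score + alpha)
    # Clamp for safety; the formula already stays inside (-1, 1).
    return max(-1.0, min(1.0, norm))

# Arbitrary raw valence sums, chosen only to show the squashing effect.
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(raw, round(normalize(raw), 4))
```

Larger sums move the score toward -1 or +1 without ever passing them, so piling on more strong words has a diminishing effect.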

Classifying text sentiment

Texts are classified by their compound score using these thresholds:

Positive: compound score >= 0.05
Neutral: -0.05 < compound score < 0.05
Negative: compound score <= -0.05
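The thresholds above can be wrapped in a small helper. This is a minimal sketch: the function name classify and the label strings are illustrative choices, not part of the VADER API.

```python
def classify(compound, threshold=0.05):
    """Map a compound score to a label using the thresholds listed above."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Arbitrary sample scores covering each branch and both boundaries.
for score in (0.8, 0.05, 0.0, -0.05, -0.6):
    print(score, classify(score))
```

In practice you would pass in the 'compound' value from the dictionary that polarity_scores returns.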

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
test = "VADER is smart, handsome, and funny."
print(analyzer.polarity_scores(test))

Output

{'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}

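From here on we use only the compound score, and run more examples to show how exclamation marks, slang, emoji, emphasis, and other devices affect it. For convenience, the results are displayed as a DataFrame.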

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd

analyzer = SentimentIntensityAnalyzer()
sentences = [
    "VADER is smart, handsome, and funny.",
    "VADER is smart, handsome, and funny!",  # with an exclamation mark
    "VADER is very smart, handsome, and funny.",
    "VADER is VERY SMART, handsome, and FUNNY.",  # ALL-CAPS emphasis
    "VADER is VERY SMART, handsome, and FUNNY!!!",
    "VADER is VERY SMART, uber handsome, and FRIGGIN FUNNY!!!",
    "VADER is not smart, handsome, nor funny.",
    "The book was good.",
    "At least it isn't a horrible book.",
    "The book was only kind of good.",
    "The plot was good, but the characters are uncompelling and the dialog is not great.",
    "Today SUX!",
    "Today only kinda sux! But I'll get by, lol",  # 'lol' acronym
    "Make sure you :) or :D today!",
    "Catch utf-8 emoji such as such as 💘 and 💋 and 😁",  # emoji
    "Not bad at all",
]

def senti(text):
    """Return the compound sentiment score for one piece of text."""
    return analyzer.polarity_scores(text)['compound']

df = pd.DataFrame(sentences, columns=['text'])
# Apply senti to every row of the text column and store the scores in the sentiment column
df['sentiment'] = df['text'].apply(senti)
print(df)

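VADER currently supports English text only. To analyze Chinese text, two main changes are needed.

This is difficult for beginners. If you are not in a hurry, work through basic Python syntax systematically first; after a few weeks of using Python you should be able to modify the library's source code yourself.

First, replace the relevant English words in the library's vaderSentiment.py with Chinese words.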

# (empirically derived mean sentiment intensity rating increase for booster words)
B_INCR = 0.293
B_DECR = -0.293
# (empirically derived mean sentiment intensity rating increase for using ALLCAPs to emphasize a word)
C_INCR = 0.733
N_SCALAR = -0.74
# negation words
NEGATE = \
["aint", "arent", "cannot", "cant", "couldnt", "darent", "didnt", "doesnt",
"ain't", "aren't", "can't", "couldn't", "daren't", "didn't", "doesn't",
"dont", "hadnt", "hasnt", "havent", "isnt", "mightnt", "mustnt", "neither",
"don't", "hadn't", "hasn't", "haven't", "isn't", "mightn't", "mustn't",
"neednt", "needn't", "never", "none", "nope", "nor", "not", "nothing", "nowhere",
"oughtnt", "shant", "shouldnt", "uhuh", "wasnt", "werent",
"oughtn't", "shan't", "shouldn't", "uh-uh", "wasn't", "weren't",
"without", "wont", "wouldnt", "won't", "wouldn't", "rarely", "seldom", "despite"]
# booster/dampener 'intensifiers' or 'degree adverbs'
# http://en.wiktionary.org/wiki/Category:English_degree_adverbs
# degree adverbs that raise or lower sentiment intensity
BOOSTER_DICT = \
{"absolutely": B_INCR, "amazingly": B_INCR, "awfully": B_INCR,
"completely": B_INCR, "considerable": B_INCR, "considerably": B_INCR,
"decidedly": B_INCR, "deeply": B_INCR, "effing": B_INCR, "enormous": B_INCR, "enormously": B_INCR,
......
"thoroughly": B_INCR, "total": B_INCR, "totally": B_INCR, "tremendous": B_INCR, "tremendously": B_INCR,
"uber": B_INCR, "unbelievably": B_INCR, "unusually": B_INCR, "utter": B_INCR, "utterly": B_INCR,
"very": B_INCR,
"almost": B_DECR, "barely": B_DECR, "hardly": B_DECR, "just enough": B_DECR,
"kind of": B_DECR, "kinda": B_DECR, "kindof": B_DECR, "kind-of": B_DECR,
"less": B_DECR, "little": B_DECR, "marginal": B_DECR, "marginally": B_DECR,
"occasional": B_DECR, "occasionally": B_DECR, "partly": B_DECR,
"scarce": B_DECR, "scarcely": B_DECR, "slight": B_DECR, "slightly": B_DECR, "somewhat": B_DECR,
"sort of": B_DECR, "sorta": B_DECR, "sortof": B_DECR, "sort-of": B_DECR}
# idioms that are not in the sentiment lexicon but still carry sentiment (not yet complete, even for English)
SENTIMENT_LADEN_IDIOMS = {"cut the mustard": 2, "hand to mouth": -2,
"back handed": -2, "blow smoke": -2, "blowing smoke": -2,
"upper hand": 1, "break a leg": 2,
"cooking with gas": 2, "in the black": 2, "in the red": -2,
"on the ball": 2, "under the weather": -2}
# special-case idioms that contain words from the lexicon
SPECIAL_CASE_IDIOMS = {"the shit": 3, "the bomb": 3, "bad ass": 1.5, "badass": 1.5,
"yeah right": -2, "kiss of death": -1.5, "to die for": 3}
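For a Chinese adaptation, the word lists above would be swapped for Chinese equivalents. The sketch below only illustrates the shape of that change. The names NEGATE_ZH, BOOSTER_DICT_ZH, and has_negation are hypothetical, the Chinese words are examples chosen for illustration rather than a vetted list, and the input is assumed to be already segmented into words, since Chinese text has no spaces between words.

```python
# Same booster constants as in the excerpt above.
B_INCR = 0.293
B_DECR = -0.293

# Hypothetical Chinese negation words (illustrative, not a vetted list).
NEGATE_ZH = ["不", "没", "没有", "别", "未", "从不"]

# Hypothetical Chinese degree adverbs: intensifiers, then dampeners.
BOOSTER_DICT_ZH = {
    "非常": B_INCR, "特别": B_INCR, "十分": B_INCR,
    "有点": B_DECR, "稍微": B_DECR, "略微": B_DECR,
}

def has_negation(tokens):
    """Return True if any token in a pre-segmented sentence is a negation word."""
    return any(token in NEGATE_ZH for token in tokens)

# Pre-segmented example sentences (segmentation is assumed to happen elsewhere).
print(has_negation(["这", "本", "书", "不", "好"]))
print(has_negation(["这", "本", "书", "非常", "好"]))
```

The idiom dictionaries (SENTIMENT_LADEN_IDIOMS and SPECIAL_CASE_IDIOMS) would need the same kind of replacement.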

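Second, replace the English lexicon vader_lexicon.txt with a Chinese lexicon in the same format. Each line of vader_lexicon.txt holds four tab-separated fields: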

TOKEN, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-HUMAN-SENTIMENT-RATINGS
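To check that a Chinese lexicon follows this layout, each line can be parsed back into its four fields. The sketch below assumes the fields are separated by tabs and that the raw ratings are written as a bracketed, comma-separated list. The function name parse_lexicon_line is hypothetical, and the sample entry is invented for illustration, not taken from any real lexicon.

```python
import ast

def parse_lexicon_line(line):
    """Split one lexicon line into token, mean rating, std deviation, raw ratings."""
    token, mean, std, raw = line.rstrip("\n").split("\t", 3)
    return {
        "token": token,
        "mean": float(mean),
        "std": float(std),
        # Assumption: raw ratings look like "[2, 2, 1, ...]".
        "raw_ratings": ast.literal_eval(raw),
    }

# Invented entry for illustration only.
sample = "好\t2.0\t0.63246\t[2, 2, 1, 3, 2, 2, 2, 1, 2, 3]"
entry = parse_lexicon_line(sample)
print(entry["token"], entry["mean"], len(entry["raw_ratings"]))
```

A quick consistency check on a hand-built lexicon is to confirm that each mean matches the average of its raw ratings.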

Citation

If you use the VADER lexicon, code, or analysis method in an academic publication, please cite the source in the following form:

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
Refactoring for Python 3 compatibility, improved modularity, and incorporation into [NLTK] ...many thanks to Ewan & Pierpaolo.
